File Formats for Data Storage in Data Analysis

CSV: Simple and Universal

CSV is ideal for data analysis when you have tabular data that is not too complex or nested.
Supported by many software tools and programming languages

  • Excel, Python, and R
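
As a minimal sketch of reading tabular CSV data, the example below uses Python's standard csv module. The sample data and the column names (name, age) are made up for illustration and are not part of these notes.

```python
import csv
import io

# Hypothetical sample data; the columns "name" and "age" are assumptions.
raw = "name,age\nAda,36\nLinus,54\n"


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(raw)

# CSV carries no type information: every value arrives as a string,
# so numeric columns have to be converted explicitly.
ages = [int(row["age"]) for row in rows]
average_age = sum(ages) / len(ages)

print(rows[0]["name"], average_age)
```

Note that every value is read as a string. This is one reason CSV suits simple, flat tables better than typed or nested data.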

JSON: Flexible and Structured

Stores data as "objects"
Can handle complex and nested data structures
  • arrays (lists) and objects (dictionaries)

Ideal for data that is not easily represented in a table
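
To make the nesting concrete, here is a small sketch using Python's standard json module. The record and its field names (name, skills, address) are hypothetical and chosen only to show an array and an object nested inside another object.

```python
import json

# Hypothetical record: one object holding an array and a nested object.
raw = '{"name": "Ada", "skills": ["math", "code"], "address": {"city": "London"}}'

# json.loads maps JSON objects to dicts and JSON arrays to lists.
record = json.loads(raw)

city = record["address"]["city"]  # value inside the nested object
skills = record["skills"]         # array, loaded as a Python list

# Serializing back to text keeps the nested structure intact.
round_trip = json.dumps(record, sort_keys=True)

print(city, len(skills))
```

A record like this would need several columns or several linked tables in CSV, which is why JSON tends to fit nested data better.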

XML: Rich and Extensible

Stores data as "elements, attributes, and text"
Can handle metadata, schemas, and namespaces

Ideal for data that is highly structured and hierarchical
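
The sketch below shows elements, attributes, and text in one small document, parsed with Python's standard xml.etree.ElementTree module. The tag names (library, book, title, year) and the attribute values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: tag and attribute names are assumptions.
raw = """<library>
  <book id="b1" lang="en">
    <title>Example Title</title>
    <year>2001</year>
  </book>
</library>"""

root = ET.fromstring(raw)        # root element: <library>
book = root.find("book")         # first <book> child element

book_id = book.get("id")         # attribute value
title = book.findtext("title")   # text content of a child element
year = int(book.findtext("year"))  # text is a string, so convert it

print(root.tag, book_id, title, year)
```

Attributes such as id and lang carry metadata about an element, while the child elements carry the data itself.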

Q: Now the question becomes: how do we decide which format to use based on the raw data that we have? Perhaps I first need to learn how to do a data analysis.